Abstract



We give a functional specification of the Alpha AXP architecture with special
emphasis on the Alpha Shared Memory Model. We keep the specification as abstract
as possible and modular in the sense that we provide an independent description
of the processors and the memory. We show how to handle a number of critical
aspects of the Alpha architecture within the functional model, such as the
specification of basic assumptions about the behavior of the processors and the
exclusion of causal loops. We use the model for specifying the notion of
lookahead and shortcut optimization for the behaviors of the processors. This
allows us to define the concept of correct processor behavior by using the
conservative sequential behavior as a reference. Finally, we extend the model to
the constructs for synchronization in the Alpha architecture and include the
instructions "read locked" as well as "store conditional".
1 Introduction



The Alpha AXP architecture is a RISC architecture that was designed for high
performance and longevity (see [AARM 92]). A major design goal was to avoid any
elements that would become limitations during a 15-25 year design horizon. The
architecture allows and supports a factor-of-1000 increase in performance. It
allows multiple instruction streams and the execution of many instructions per
clock cycle, as well as multiple data streams and instruction stream memory
management. The interaction between the processors and the memory is highly
underspecified to allow implementations to be very flexible in adopting future
speed-up techniques.

We give a functional specification of the Alpha architecture and in particular of 
the Alpha Shared Memory using a data flow model. The "nondeterminism" in the 
architecture is modeled by underspecification. 

The mathematical basis for the functional model is "streams" and "stream
processing functions" (for a short introduction see Appendix B or, for more
details, [Broy 90]). Every device in the Alpha architecture is described by a
logical formula that characterizes the stream processing functions representing
the behavior of the components and the streams flowing between the components.
In this way the behavior of the processors and the behavior of the memory are
described by separate logical formulas. The behavior of the overall system is
obtained by a composition of the formulas describing the behaviors of the
subsystems. This structure supports independent reasoning about the different
devices: it is possible to reason about the behavior of the complete system by
using the specifications of its components.

There are numerous papers that deal with sequential consistency, serializability
and related aspects of concurrent access to memory and database systems (see
[Shasha, Snir 88]). For modern machine architectures with caches and concurrent
execution of processors, difficult programming issues arise (see, for example,
[Attiya, Friedmann 94]). Our main motivation is to provide a simple and powerful
model of such architectures. The model also allows us to clarify a number of
issues of high practical relevance, such as the exclusion of causal loops and the
definition of correct optimization. So far, these issues have not been addressed
sufficiently in the literature.

The paper is structured as follows. Section 2 describes the basic structure of
the model of the Alpha AXP architecture and its components. In Sections 3 and 4
we describe the behaviors of the memory and of the processors. Section 5
summarizes the description. The remaining part of the paper shows how to make
use of the description. In Section 6 we study the exclusion of causal loops and
analyze basic assumptions about the behavior of processors. We characterize
speed-ups of the processors by advanced lookahead when issuing memory requests.
We define the concept of lookahead and shortcut optimization of processor
behaviors. This allows us to define the correctness of processor behavior using
the strictly sequential behavior as the reference. Roughly speaking, the behavior
of a processor is correct if it is an optimization of the behavior of a strictly
sequential processor. We prove that for such optimizations the results of
programs that run without shared memory are effectively equivalent to the
results that we obtain for processors with strictly sequential behavior and a
memory that does not reorder memory accesses.

In Section 7 we extend the model to memory access with locked loads and
conditional stores. We show for a simple example how to prove properties of
programs using synchronization protocols.

In Appendix A we give a more liberal scheduling strategy for the memory requests
of the processors than the one described in [AARM 92]. In Appendix B we include
the essentials of the mathematics of the functional system model.

2 The Model 

In this section we describe the structure of the mathematical model that we suggest 
for the Alpha architecture. 

2.1 Basic Components 

The Alpha architecture consists of a number of processors and a memory. The
memory consists of a set of locations in which data can be stored. By PRC we
denote the set of processors, by LOC the set of locations, and by DATA the set
of data values. The set DATA includes instructions.

2.2 Actions

In the Alpha architecture the relevant actions for the interaction between the
processors and the memory are read and write actions, instruction fetches, and
memory and instruction barriers. The table given in Figure 1 lists the syntax of
the actions and introduces some useful selector functions.

However, we prefer to think about the execution of these actions not as one
atomic step but in terms of an interaction between the processors and the
memory. To execute an action, the processor issues a request (similar to a
procedure call in a conventional programming language) and the memory responds
to it by a memory response (similar to the result returned by a procedure call).
Following this concept we decompose each of the actions into memory requests and
memory responses. Processors issue memory requests and receive read response
messages. The memory receives memory requests and issues read response messages.
Action                      | Syntax   | processor | location | data
Read action                 | P:R(x,a) | P         | x        | a
Write action                | P:W(x,a) | P         | x        | a
Instruction fetch           | P:I(x,b) | P         | x        | b
Memory barrier              | P:MB     | P         |          |
Instruction memory barrier  | P:IMB    | P         |          |

Figure 1: Table of memory actions and the selector functions processor,
location and data



Action name                 | Syntax   | Memory request | Response
Read action                 | P:R(x,a) | P:R?x          | P:R(x,a)
Write action                | P:W(x,a) | P:W(x,a)       | (none)
Instruction fetch           | P:I(x,b) | P:I?x          | P:I(x,b)
Memory barrier              | P:MB     | P:MB           | (none)
Instruction memory barrier  | P:IMB    | P:IMB          | (none)

Figure 2: Table of actions and their split into requests and responses



Let P be a processor, x a location, and a some data element. A memory request is
one of the following:

- a write request, represented by the action P:W(x,a);

- a read request, represented by the action P:R?x;

- a memory barrier request, represented by the action P:MB;

- an instruction fetch request, represented by the action P:I?x;

- or an instruction memory barrier request, represented by the action P:IMB.

By REQ we denote the set of memory requests.

The memory replies to a read request by a read response message, represented by
P:R(x,a). It replies to an instruction fetch request by an instruction fetch
response message, represented by P:I(x,a). By RFR we denote the set of read
response and instruction fetch response messages.

The table given in Figure 2 summarizes the decomposition of actions into memory
requests and the responses to them. Write requests and barrier requests need no
responses.
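For illustration, the split of Figure 2 can be sketched in Python. The tuple
encoding below (for example ("R?", P, x) for a read request and ("R", P, x, a)
for a read response) and the function name split_action are our own
assumptions; they are not taken from [AARM 92].

```python
# Hypothetical tuple encoding of the messages of Figure 2:
#   requests : ("R?", P, x), ("W", P, x, a), ("I?", P, x), ("MB", P), ("IMB", P)
#   responses: ("R", P, x, a), ("I", P, x, b)

def split_action(action):
    """Split an action of Figure 1 into (memory request, response or None).

    Actions are encoded as ("R", P, x, a), ("W", P, x, a), ("I", P, x, b),
    ("MB", P) or ("IMB", P).
    """
    kind = action[0]
    if kind == "R":                      # read: request P:R?x, response P:R(x,a)
        return ("R?", action[1], action[2]), action
    if kind == "I":                      # fetch: request P:I?x, response P:I(x,b)
        return ("I?", action[1], action[2]), action
    if kind in ("W", "MB", "IMB"):       # the request is the action itself
        return action, None              # and no response is sent
    raise ValueError(f"unknown action: {action!r}")


print(split_action(("R", "P", "x", 5)))   # (('R?', 'P', 'x'), ('R', 'P', 'x', 5))
print(split_action(("W", "Q", "y", 1)))   # (('W', 'Q', 'y', 1), None)
```

Only reads and fetches yield a pair with a response; this mirrors the fact that
the memory sends nothing back for writes and barriers.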
Figure 3: Data flow graph of the Alpha architecture



The decomposition of actions into requests and responses has a number of
advantages. It allows us to distinguish between the issue order of memory
requests, determined by the processors, the access order of memory requests,
determined by the memory, and the reception order of the responses by the
processors. This allows us to model the scheduling discipline of memory access
more explicitly and to modularize the model more effectively.



2.3 The Model of the Architecture 

We model the Alpha architecture by a data flow model as given in Figure 3. In the
data flow model the processors and the memory are independent units that
communicate asynchronously via message exchange. The processors are connected to
the memory by channels. The channels are denoted by cv, $cs_P$, $cr_P$, and ct,
where P is a processor.

Each processor sends a stream of memory requests to the memory and receives a
stream of read response messages from the memory. In a computation of the Alpha
architecture the behavior of each processor $P \in \mathrm{PRC}$ is modeled by a
stream processing function

$$f_P : \mathrm{RFR}^\omega \to \mathrm{REQ}^\omega$$

A stream processing function is a prefix continuous function on streams. A short 
introduction to the mathematics of streams and stream processing functions is given 
in appendix B. There we also introduce the notation we are using throughout the 
paper for writing logical formulas for streams and stream processing functions. 
In a computation a stream is associated with each of the channels cv, $cs_P$,
$cr_P$, and ct. We denote these streams by the same identifiers cv, $cs_P$,
$cr_P$, and ct, respectively.

The memory receives the streams of memory requests from the processors and in
response sends back a stream of read response messages. Mathematically the
memory is modeled by a stream processing function

$$md : (\mathrm{PRC} \to \mathrm{REQ}^\omega) \to \mathrm{RFR}^\omega$$

Since each response message is labeled by the identifier of the processor that has 
requested it, the stream of read response messages produced by the memory can 
easily be split into individual streams of responses for each of the processors. 

The Alpha architecture is modeled as follows: for every processor P we represent
the history of messages exchanged between the processor and the memory by the
input stream $cs_P \in \mathrm{RFR}^\omega$ and the output stream
$cr_P \in \mathrm{REQ}^\omega$ of the processor. Of course, the stream $cr_P$ is
the result of the processor function $f_P$ applied to the stream $cs_P$ of memory
responses. This is expressed mathematically as:

$$cr_P = f_P(cs_P)$$

By cr we denote the mapping that associates with each processor in PRC the
stream of requests it produces, and by cs the mapping that associates with each
processor in PRC the stream of responses that the memory produces for it.
Formally, cr is a mapping that associates a stream of memory requests with every
processor:

$$cr : \mathrm{PRC} \to \mathrm{REQ}^\omega$$

and cs is a mapping that associates the input stream of memory responses with
every processor:

$$cs : \mathrm{PRC} \to \mathrm{RFR}^\omega$$

We specify the history of messages between the memory and the processors by a
stream $cv \in \mathrm{RFR}^\omega$ that represents the stream of all read and
instruction fetch responses produced by the memory. We do not assume any
responses for write or barrier requests.

By the split component called D in the data flow graph given in Figure 3 we
obtain the input streams $cs_P$ for each of the processors from cv.

In the functional approach, a distributed system is modeled by associating a set
of monotonic stream processing functions with each component. An instance of the
behavior of the system, a computation, is obtained as follows: for each
component one function is chosen out of the set of stream processing functions
modelling that component. By these functions we associate a stream with each
channel that connects the components. These streams are determined as the least
fixpoint for the functions associated with each of the components. In the case
of the scheduler, the choice of the function representing the behavior of the
scheduler is constrained by specific properties of the streams that form the
fixpoints. This modelling technique, where there is a dependency between the
selected behavior function and its actual input stream, is called "input choice
specification" and is described in detail in [Broy 93].

Causality is a decisive notion in information processing and message passing 
systems. It imposes a "physical law" on the flow of information. Roughly speaking, 
we understand by the phrase "action a is causal for action b" that b cannot take place 
before a has happened. In distributed systems that consist of a set of components 
that exchange messages via channels between the components, we distinguish two 
forms of causality: 

• component causality: a component cannot issue an output message before it has
received the input required for computing the content of the output message.
This form of causality between input and output of a component is captured in
the functional model by the monotonicity constraint.

• information exchange causality: a message is not received before it has been 
sent. This is captured in the functional model by the least fixpoint principle 
for the recursive stream equations for the channels. 

These two forms of causality are an integral part of the functional system model. 
They are the basis for ruling out "causal loops". 

3 The Memory Architecture 

We formalize the requirements for the behavior of the memory by characterizing
the relation between its input streams and its output stream. The requirements
for the memory fall into the following two categories:

• Proper memory access scheduling: the memory requests of the individual
processors are rescheduled and merged; the scheduling is restricted with respect
to memory barriers and instruction memory barriers as well as read and write
requests for the same location.

• Read/write consistency: in response to a read request, the memory sends the
data element written by the most recent write request for that location in the
access stream.
Both requirements are described in [AARM 92] in a semiformal way that does not
provide explicit answers to a number of critical questions. For instance, it is
not clear whether so-called causal loops are implicitly excluded. Certainly they
are not explicitly excluded in [AARM 92]. We shall come back to the issue of
causal loops after we have introduced the mathematical model.

To keep the model simple and to achieve a good separation of concerns, we
decompose the function md modeling the behavior of the memory into two prefix
continuous stream processing functions, the memory scheduler

$$ms : (\mathrm{PRC} \to \mathrm{REQ}^\omega) \to \mathrm{REQ}^\omega$$

which takes care of the proper memory access scheduling and determines the
access stream and thus the access order, and the memory manager

$$mm : \mathrm{REQ}^\omega \to \mathrm{RFR}^\omega$$

which models read/write consistency (see Figure 3). The function md is then
simply obtained by composing ms and mm. Mathematically, we specify the result
stream of the function md for all mappings of streams of memory requests
$r : \mathrm{PRC} \to \mathrm{REQ}^\omega$ by the following equation:

$$md(r) = mm(ms(r))$$

Precise specifications of the functions ms and mm are given in Sections 3.1 and
3.2.

As shown in the data flow diagram given in Figure 3, we specify the history of
messages between the memory scheduler and the memory manager by an access stream

$$ct \in \mathrm{REQ}^\omega$$

and the history of messages between the memory manager and the processors by a
stream of read and instruction fetch response messages

$$cv \in \mathrm{RFR}^\omega$$

The stream ct denotes the sequence of scheduled requests, that is, the access
order of the memory requests. It is the output of the function ms applied to the
request streams of the processors:

$$ct = ms(cr)$$

The stream cv denotes the output of the function mm. Mathematically, this is
expressed by the equation:

$$cv = mm(ct)$$
1st \ 2nd  | P:I?x | P:R?x | P:W(x,b) | P:MB | P:IMB
P:I?x      |   <   |       |    <     |  <   |   <
P:R?x      |       |   <   |    <     |  <   |   <
P:W(x,a)   |       |   <   |    <     |  <   |   <
P:MB       |       |   <   |    <     |  <   |   <
P:IMB      |   <   |   <   |    <     |  <   |   <

Figure 4: Table of relations between requests in the issue order and the access
order
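The table in Figure 4 can be read as a relation < on pairs of requests issued by
the same processor. The following Python sketch encodes the entries of Figure 4;
the tuple encoding of requests (for example ("W", P, x, a) or ("MB", P)) and the
name must_precede are illustrative assumptions. Entries between two read, write
or fetch requests are taken to apply only when both requests refer to the same
location, as the common location x in the table suggests.

```python
# Pairs (first kind, second kind) marked "<" in Figure 4.
ORDERED = {
    ("I?", "I?"), ("I?", "W"), ("I?", "MB"), ("I?", "IMB"),
    ("R?", "R?"), ("R?", "W"), ("R?", "MB"), ("R?", "IMB"),
    ("W", "R?"), ("W", "W"), ("W", "MB"), ("W", "IMB"),
    ("MB", "R?"), ("MB", "W"), ("MB", "MB"), ("MB", "IMB"),
    ("IMB", "I?"), ("IMB", "R?"), ("IMB", "W"), ("IMB", "MB"), ("IMB", "IMB"),
}
LOCATED = {"I?", "R?", "W"}    # request kinds that carry a location


def must_precede(c1, c2):
    """True if c1 < c2: when c1 is issued before c2, it must also be accessed first."""
    if c1[1] != c2[1]:                       # different processors: no constraint
        return False
    if (c1[0], c2[0]) not in ORDERED:
        return False
    if c1[0] in LOCATED and c2[0] in LOCATED:
        return c1[2] == c2[2]                # only for the same location x
    return True                              # a barrier is involved


print(must_precede(("W", "P", "x", 1), ("R?", "P", "x")))   # True
print(must_precede(("MB", "P"), ("I?", "P", "x")))          # False
```

The second call shows why the instruction memory barrier exists: a memory
barrier does not order later instruction fetches, whereas an IMB does.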

From the stream cv the input streams $cs_P \in \mathrm{RFR}^\omega$ of memory
responses for the individual processors $P \in \mathrm{PRC}$ can easily be
computed.

3.1 Scheduling the Memory Requests 

We give the specification of the function ms by the relation between the streams
$cr_P$ and ct. The function ms merges and reschedules its input streams. This
merging and rearrangement follows the rules described in [AARM 92]. In this
section we formalize these rules.

The table given in Figure 4 is taken from [AARM 92]. It shows the reordering
restrictions that the issue streams $cr_P$ impose on the access stream ct. The
table expresses the following requirement in terms of the mathematical model:
for every pair of memory requests $c_1, c_2$ of processor P for which we have
$c_1 < c_2$ in the table in Figure 4, their relative order in the issue stream
$cr_P$ and in the access stream ct coincides. Mathematically we write

$$c_1 < c_2 \Rightarrow [c_1 < c_2] \text{ in } [cr_P, ct]$$

Here (for arbitrary streams r, t) the proposition

$$[c_1 < c_2] \text{ in } [r, t]$$

stands for the following two conditions. Condition (1) expresses that all
requests $c_1$ and $c_2$ in r are eventually scheduled in t, and that only
requests that have been issued are scheduled. Its formalization is rather
straightforward: every memory request c in each of the issue streams r also
appears in the access stream t. Since all requests are labeled by the
processors, requests in the issue streams of different processors are always
distinct. Mathematically expressed, condition (1) reads as follows (for the
definition of the filter function $x|_M$ see Appendix B):

$$r|_{\{c_i\}} = t|_{\{c_i\}} \quad \text{for } i \in \{1, 2\}$$
Condition (2) expresses that, relative to the requests $c_2$, the requests $c_1$
may not come later in the stream t than in the stream r. In other words, to
obtain the substream of the requests $c_1$ and $c_2$ in t from that in r, the
scheduler may not move a request $c_2$ to the left over a request $c_1$.
Mathematically expressed, condition (2) reads as follows:

$$\forall k \in \mathbb{N}:\; h_k(r)|_{\{c_1\}} \sqsubseteq h_k(t)|_{\{c_1\}}$$

where $\sqsubseteq$ denotes the prefix order on streams and

$$h_k(s) = (s|_{\{c_1, c_2\}})[1:k]$$

Here we use the following notation. For a stream s we denote by $s[1:k]$ the
first k elements of the stream. If a stream s has fewer than k elements, then
$s[1:k] = s$.
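On finite streams the two conditions can be checked directly. The Python sketch
below implements the filter $s|_M$, the function $h_k$ and the proposition
$[c_1 < c_2]$ in $[r, t]$; streams are modeled as lists, requests as hashable
tuples, and the helper names are hypothetical. Since $h_k(r)|_{\{c_1\}}$ consists
of copies of $c_1$ only, the prefix relation in condition (2) reduces to
comparing the numbers of occurrences of $c_1$.

```python
def restrict(s, m):
    """The filter s|M: the substream of s consisting of the elements in m."""
    return [e for e in s if e in m]


def h(k, s, c1, c2):
    """h_k(s) = (s|{c1, c2})[1:k]: at most the first k elements."""
    return restrict(s, {c1, c2})[:k]


def ordered_in(c1, c2, r, t):
    """[c1 < c2] in [r, t] for a finite issue stream r and access stream t."""
    # Condition (1): r|{c} = t|{c} for c in {c1, c2}.
    if any(restrict(r, {c}) != restrict(t, {c}) for c in (c1, c2)):
        return False
    # Condition (2): h_k(r)|{c1} is a prefix of h_k(t)|{c1}. For k beyond the
    # length of the c1/c2 substream both sides are equal by condition (1).
    n = len(restrict(r, {c1, c2}))
    return all(
        len(restrict(h(k, r, c1, c2), {c1})) <= len(restrict(h(k, t, c1, c2), {c1}))
        for k in range(n + 1)
    )


w, rd = ("W", "P", "x", 1), ("R?", "P", "x")
print(ordered_in(w, rd, [w, rd], [("W", "Q", "y", 2), w, rd]))   # True
print(ordered_in(w, rd, [w, rd], [rd, w]))                        # False
```

Requests of other processors, such as ("W", "Q", "y", 2), are simply filtered
away, which reflects that Figure 4 only constrains requests of one processor.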

The relationship between the issue streams and the access stream as formalized
above has the following consequence: for every set M of requests of the
processor P that are pairwise in the <-relation, the substreams of the requests
in M in the issue stream and in the access stream are identical. Mathematically
expressed:

$$(\forall c, d \in M: c < d \lor d < c) \Rightarrow cr_P|_M = ct|_M$$

This can be shown for finite streams $cr_P$ and ct by induction on the length of
the stream ct. For infinite streams it follows from the continuity of the
function that filters out a substream.

Based on the notation introduced above, we can now formulate the correctness
requirement for the scheduling function. We decompose the requirements for the
scheduler into safety and liveness properties. In a first step we give just the
safety property. It is represented by a simple predicate characterizing the set
of scheduling functions that are correct with respect to safety. A function

$$ms : (\mathrm{PRC} \to \mathrm{REQ}^\omega) \to \mathrm{REQ}^\omega$$

is called a safe scheduling function if for all request streams
$r \in (\mathrm{PRC} \to \mathrm{REQ}^\omega)$ for which $r_P$ contains only
requests issued by processor P we have

$$\exists t \in \mathrm{REQ}^\omega:\; ms(r) \sqsubseteq t \;\land\; \forall c_1, c_2, P:\; c_1 < c_2 \Rightarrow [c_1 < c_2] \text{ in } [r_P, t]$$

This requirement is a safety condition for the scheduler. It expresses that the
requests are in a proper relationship, if they are scheduled at all. It does not
express the liveness property that all requests are eventually scheduled. The
liveness condition for the scheduler is not a simple predicate on ms, but also
depends on the particular request streams in the data flow model. It will
therefore be added in Section 5, where we summarize the model, as a constraint
for the scheduler function ms with respect to the stream cr in the network
modeling the Alpha architecture.
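A scheduler that merely interleaves the issue streams, without ever reordering
the requests of a single processor, is always safe: every issue stream reappears
unchanged as a substream of the access stream, so conditions (1) and (2) hold for
every pair $c_1 < c_2$. The round-robin merge below is a minimal Python sketch of
such a scheduler; the policy, the name round_robin and the tuple encoding of
requests are our own choices. It works on complete finite streams and is
therefore not itself a model of the prefix continuous function ms, and it gives
up all the reordering freedom that Figure 4 permits.

```python
from itertools import zip_longest


def round_robin(issue):
    """Merge finite issue streams {P: [requests]} without reordering any of them."""
    procs = sorted(issue)                          # a fixed processor order
    merged = []
    for column in zip_longest(*(issue[p] for p in procs)):
        merged.extend(c for c in column if c is not None)
    return merged


issue = {"P": [("R?", "P", "x"), ("W", "P", "y", 1)], "Q": [("R?", "Q", "y")]}
print(round_robin(issue))
# [('R?', 'P', 'x'), ('R?', 'Q', 'y'), ('W', 'P', 'y', 1)]
```

Filtering the merged stream by processor gives back each issue stream exactly,
which is the property the safety argument relies on.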

3.2 Responding to the Scheduled Memory Requests 

The behavior of the memory manager is represented by the function mm, which
executes the memory requests in the order produced by the memory request
scheduler ms. Its specification is rather simple. In response to each read
request the memory manager sends the data element that has been written to that
location by the most recently executed write request. Because of the
rescheduling of requests, the execution order is not necessarily identical to the
issue order.

We ignore here the possibility that the memory may be initialized by other means
than memory write requests. Explicit initialization can easily be included,
however.

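For our mathematical model, we express read/write consistency by the following
formula: for all streams $u, w \in \mathrm{REQ}^*$ and $s \in \mathrm{REQ}^\omega$
we assume:

$$
\begin{aligned}
&(u \frown w)|_{RI} = \langle\rangle \;\land\; w|_{W(x)} = \langle\rangle \\
&\Rightarrow\; mm(u \frown Q{:}\mathrm{W}(x,a) \frown w \frown P{:}\mathrm{R}?x \frown s)
  = P{:}\mathrm{R}(x,a) \frown mm(u \frown Q{:}\mathrm{W}(x,a) \frown w \frown s) \\
&\phantom{\Rightarrow}\;\land\; mm(u \frown Q{:}\mathrm{W}(x,a) \frown w \frown P{:}\mathrm{I}?x \frown s)
  = P{:}\mathrm{I}(x,a) \frown mm(u \frown Q{:}\mathrm{W}(x,a) \frown w \frown s)
\end{aligned}
$$

where W(x) is the set of all write requests for location x:

$$W(x) = \{P{:}\mathrm{W}(x,a) \in \mathrm{REQ} : P \in \mathrm{PRC} \land a \in \mathrm{DATA}\}$$

and RI is the set of all read requests and instruction fetch requests:

$$RI = \{P{:}\mathrm{R}?x : P \in \mathrm{PRC} \land x \in \mathrm{LOC}\} \cup \{P{:}\mathrm{I}?x : P \in \mathrm{PRC} \land x \in \mathrm{LOC}\}$$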

The stream cv is the result of applying the memory function mm to the stream ct
of memory requests:

$$cv = mm(ct)$$

The stream cv of read and instruction fetch responses produced by the memory
manager mm can easily be decomposed into one response stream $cs_P$ for each
processor P. We specify:

$$cs_P = D_P(cv) \quad \text{where } \forall v: D_P(v) = v|_{R(P)}$$
where the set R(P) of responses for the processor P is specified as follows:

$$R(P) = \{e \in \mathrm{RFR} : processor(e) = P\}$$
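A direct reading of the read/write consistency requirement on finite access
streams is sketched below in Python, together with the split $D_P$. Requests and
responses are encoded as tuples such as ("W", P, x, a), ("R?", P, x) and
("R", P, x, a); these encodings and the function names are assumptions for
illustration. Since the model has no initialization, the sketch stops producing
responses at the first read or fetch with no preceding write to its location.

```python
def mm(ct):
    """Memory manager on a finite access stream ct (read/write consistency).

    Each read or fetch request is answered with the value of the most recent
    preceding write to the same location; writes and barriers produce no output.
    """
    memory = {}                          # location -> most recently written value
    cv = []
    for req in ct:
        if req[0] == "W":
            memory[req[2]] = req[3]
        elif req[0] in ("R?", "I?"):
            loc = req[2]
            if loc not in memory:        # no preceding write: left open by the
                break                    # model, so this sketch stops here
            cv.append(("R" if req[0] == "R?" else "I", req[1], loc, memory[loc]))
    return cv


def split(cv, proc):
    """D_P(cv) = cv|R(P): the responses addressed to processor proc."""
    return [e for e in cv if e[1] == proc]


ct = [("W", "Q", "x", 7), ("MB", "P"), ("R?", "P", "x"),
      ("W", "P", "x", 8), ("I?", "Q", "x")]
cv = mm(ct)
print(cv)               # [('R', 'P', 'x', 7), ('I', 'Q', 'x', 8)]
print(split(cv, "P"))   # [('R', 'P', 'x', 7)]
```

Stopping at an unanswerable read keeps the function prefix monotone: no
extension of the access stream can place a write before that read.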

Of course, there are other ways to decompose the memory function md. For
instance, we may introduce a merge function that interleaves the streams of
memory requests produced by the processors into a single stream of memory
requests. Using this merge function, we can then either introduce for each
processor an individual memory scheduler that reschedules its memory requests
before they are merged, or first merge all request streams and then do the
memory scheduling by rescheduling the merged stream.

4 The Processors 

The behavior of a processor is modeled by a prefix continuous stream processing 
function. In this section we show only schematically how we model the instruction 
cycle of an Alpha processor by such a function. 

A processor has a local state that consists of all the entries in its registers
and possibly additional information. The set of states of a processor is denoted
by PRCState. Initially, a processor starts from an initial state by issuing a
finite sequence of memory requests.

Whenever a processor receives a memory response, it changes its local state and
issues a finite (possibly empty) sequence of memory requests. To formalize this
behavior of a processor, we introduce the following two functions:

$$mr : \mathrm{PRCState} \times \mathrm{RFR} \to \mathrm{REQ}^*$$

$$sc : \mathrm{PRCState} \times \mathrm{RFR} \to \mathrm{PRCState}$$

The function mr yields the sequence of memory requests issued by the processor in
a state when receiving a memory response. The function sc yields the successor
state of the processor. The behavior of the processor P is defined by the
function $f_P$. We specify this function by the following equation:

$$f_P(s) = init \frown exec(\sigma_P, s)$$

where $\sigma_P$ is the initial state of the processor P, init is the initial
sequence of memory requests, and exec is the function

$$exec : \mathrm{PRCState} \times \mathrm{RFR}^\omega \to \mathrm{REQ}^\omega$$
specified by

$$exec(\sigma, c \frown s) = mr(\sigma, c) \frown exec(sc(\sigma, c), s)$$

Of course, a processor P issues only memory requests labeled by P.
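The schematic definition of $f_P$ can be mirrored directly in code. The Python
sketch below builds $f_P$ from init, an initial state and the functions mr and
sc, and evaluates it on finite response streams; the tuple encoding of messages,
the name make_processor and the two-state example processor are hypothetical.
For the empty input the sketch returns just init.

```python
def make_processor(init, sigma_p, mr, sc):
    """Return f_P with f_P(s) = init ^ exec(sigma_p, s) for finite response streams."""
    def f_p(responses):
        out, sigma = list(init), sigma_p
        for c in responses:
            # exec(sigma, c ^ s) = mr(sigma, c) ^ exec(sc(sigma, c), s)
            out.extend(mr(sigma, c))
            sigma = sc(sigma, c)
        return out
    return f_p


# Hypothetical processor P running "LD R1 x; ST R1 y" (instruction fetches
# omitted): it first asks for x and, once the value arrives, writes it to y.
def mr(sigma, c):
    return [("W", "P", "y", c[3])] if sigma == "wait" and c[0] == "R" else []


def sc(sigma, c):
    return "done" if sigma == "wait" and c[0] == "R" else sigma


f_P = make_processor([("R?", "P", "x")], "wait", mr, sc)
print(f_P([]))                     # [('R?', 'P', 'x')]
print(f_P([("R", "P", "x", 5)]))   # [('R?', 'P', 'x'), ('W', 'P', 'y', 5)]
```

Because the output is only ever extended as further responses arrive, the
resulting function is prefix monotone, as the model requires.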

We do not give a more detailed description of the individual instructions and
their execution here. We come back to this issue in Section 6.

5 Summary of the Model 

In this section we summarize the functional model of the Alpha architecture as a 
reminder for the readers. 

As indicated in Figure 3, we assume that the streams $cr_P$, $cs_P$, cv and ct
are the least fixpoints of the following equations:

$$cr_P = f_P(cs_P) \quad \text{for all processors } P \in \mathrm{PRC}$$

$$cs_P = cv|_{R(P)} \quad \text{for all processors } P \in \mathrm{PRC}$$

$$ct = ms(cr)$$

$$cv = mm(ct)$$

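where for all processors $P \in \mathrm{PRC}$ the function $f_P$ and also the
functions mm and ms are stream processing functions. This means in particular
that they are prefix continuous. The function $f_P$ is assumed to be a processor
behavior. The function ms is required to be a safe scheduling function. The
function mm fulfills the following requirement: for all streams
$u, w \in \mathrm{REQ}^*$ and $s \in \mathrm{REQ}^\omega$:

$$
\begin{aligned}
&(u \frown w)|_{RI} = \langle\rangle \;\land\; w|_{W(x)} = \langle\rangle \\
&\Rightarrow\; mm(u \frown Q{:}\mathrm{W}(x,a) \frown w \frown P{:}\mathrm{R}?x \frown s)
  = P{:}\mathrm{R}(x,a) \frown mm(u \frown Q{:}\mathrm{W}(x,a) \frown w \frown s) \\
&\phantom{\Rightarrow}\;\land\; mm(u \frown Q{:}\mathrm{W}(x,a) \frown w \frown P{:}\mathrm{I}?x \frown s)
  = P{:}\mathrm{I}(x,a) \frown mm(u \frown Q{:}\mathrm{W}(x,a) \frown w \frown s)
\end{aligned}
$$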

For the function ms we require the following liveness condition: every request c
is eventually scheduled; mathematically expressed, for every processor
$P \in \mathrm{PRC}$ and every request c:

$$cr_P|_{\{c\}} = ms(cr)|_{\{c\}}$$

In addition to the safety property required for ms, this liveness condition
restricts the choice of the scheduling function ms: it ensures that all requests
are eventually scheduled.

The prefix monotonicity requirement for the functions models the causality
within the processors and thus the causality within their programs. The least
fixpoint property of the streams described by the recursive equations models the
causality between the sending and receiving of messages. If either of these
requirements is dropped, then causal loops are no longer excluded.
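The least fixpoint of the equations above can be approximated by Kleene
iteration: starting from empty streams, the functions are applied repeatedly,
and by prefix monotonicity the streams only grow. The Python sketch below does
this for a hypothetical network with a single processor, so that cv coincides
with $cs_P$ and the scheduler may keep the issue order; the processor program,
the tuple encoding and all function names are illustrative assumptions, and
uninitialized reads are simply not answered.

```python
def mm(ct):
    """Read/write consistent memory manager for read requests on a finite access
    stream; it stops at a read with no preceding write (no initialization)."""
    memory, cv = {}, []
    for req in ct:
        if req[0] == "W":
            memory[req[2]] = req[3]
        elif req[0] == "R?":
            if req[2] not in memory:
                break
            cv.append(("R", req[1], req[2], memory[req[2]]))
    return cv


def f_P(cs):
    """Hypothetical processor: write 7 to x, read x, then copy the value to y."""
    cr = [("W", "P", "x", 7), ("R?", "P", "x")]
    cr += [("W", "P", "y", e[3]) for e in cs if e[0] == "R" and e[2] == "x"]
    return cr


def ms(cr):
    """With a single processor the scheduler may simply keep the issue order."""
    return list(cr)


def least_fixpoint(f, ms, mm, max_rounds=100):
    """Kleene iteration for cr = f(cs), cs = mm(ms(cr)), starting from empty streams."""
    cs = []
    for _ in range(max_rounds):
        cr = f(cs)
        new_cs = mm(ms(cr))
        if new_cs == cs:
            return cr, cs
        cs = new_cs
    raise RuntimeError("no fixpoint reached within max_rounds")


cr, cs = least_fixpoint(f_P, ms, mm)
print(cr)   # [('W', 'P', 'x', 7), ('R?', 'P', 'x'), ('W', 'P', 'y', 7)]
print(cs)   # [('R', 'P', 'x', 7)]
```

Each round corresponds to one further step of information flow around the loop
of Figure 3; no value can appear in a stream before the messages it depends on.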






6 The Model at Work 



In this section we start with a short analysis of the functional model and then
show how it can be used to formalize further properties of the processors and
their execution of instruction streams.

6.1 Mathematics of the Model 

The model given in the previous sections is based on the following mathematical 
concepts: 

• prefix continuous stream processing functions, 

• recursive stream equations and least fixpoint interpretations for them, 

• liveness constraints for the scheduling function. 

These concepts are well suited as a mathematical basis for data flow models.
Hardware systems can also be understood as data flow systems. A large number of
examples have demonstrated that the concepts work well for both.

The purpose and benefits of mathematical or formal models for information 
processing systems are manifold: 

• Mathematical system models provide a consistent and precise description of 
the properties of a system, but nevertheless give freedom by leaving certain 
aspects deliberately unspecified; we speak of underspecification. 

• In the process of deriving mathematical system models from informal
descriptions, flaws, inaccuracies and omissions can be detected and clarified.

• Mathematical system models provide a reference basis for understanding and 
discussion. 

• Mathematical system models provide a basis for formal reasoning about a
system, by which specific properties can be derived (and therefore verified).

• Mathematical system models provide a precise requirement specification for 
implementations by hardware or software systems. 

The functional model given for the Alpha AXP architecture exhibits a number of 
typical properties. For instance, every response in the response stream is triggered 
by a request. This property is formalized by the following definition. 
Definition 1 (Feasible Response Streams) A response stream
$s \in \mathrm{RFR}^\omega$ is called feasible for a processor with behavior
$f_P$ if all the instances of responses in s are triggered by requests in the
request stream $f_P(s)$. A response stream s is triggered by the request stream
$f_P(s)$ if every prefix $s'$ of the stream s contains, for every instance of a
read request and every instance of an instruction fetch request in $f_P(s')$, at
most one corresponding response. Mathematically expressed, a stream s is a
feasible response stream for the behavior $f_P$ of processor P if for all its
prefixes $s' \sqsubseteq s$ and all locations $x \in \mathrm{LOC}$ the following
two conditions are fulfilled:

$$\#(s'|_{R(x)}) \le \#(f_P(s')|_{\{P:\mathrm{R}?x\}})$$

and

$$\#(s'|_{I(x)}) \le \#(f_P(s')|_{\{P:\mathrm{I}?x\}})$$

where

$$R(x) = \{P{:}\mathrm{R}(x,a) : a \in \mathrm{DATA}\}, \qquad I(x) = \{P{:}\mathrm{I}(x,a) : a \in \mathrm{DATA}\}$$

We then write: 

fp s 

If the inequalities above are strengthened to equalities for $s' = s$, then the
stream s is called a complete and feasible response stream for the processor
behavior $f_P$.

□ 

It is a straightforward exercise to show that in our model of the Alpha
architecture, by the definition of the memory device, the stream $cs_P$ is a
feasible response stream for $f_P$ for every processor P.
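For finite response streams Definition 1 can be checked mechanically. The Python
sketch below tests both inequalities for every prefix $s'$ and every location
that occurs; the tuple encoding of messages, the helper names and the example
behavior are hypothetical, and all messages are assumed to belong to the
processor P under consideration.

```python
def count(s, pred):
    """Number of elements of the finite stream s satisfying pred."""
    return sum(1 for e in s if pred(e))


def is_feasible(f_p, s, complete=False):
    """Definition 1 for a finite response stream s and processor behavior f_p."""
    for n in range(len(s) + 1):
        prefix = s[:n]
        requests = f_p(prefix)
        locations = {e[2] for e in prefix} | {c[2] for c in requests if c[0] in ("R?", "I?")}
        for resp_kind, req_kind in (("R", "R?"), ("I", "I?")):
            for x in locations:
                got = count(prefix, lambda e: e[0] == resp_kind and e[2] == x)
                asked = count(requests, lambda c: c[0] == req_kind and c[2] == x)
                if got > asked:
                    return False          # a response without a matching request
                if complete and n == len(s) and got != asked:
                    return False          # a request left unanswered
    return True


# Hypothetical behavior: read x once, then copy each value read into y.
def behavior(s):
    return [("R?", "P", "x")] + [("W", "P", "y", e[3]) for e in s if e[0] == "R"]


print(is_feasible(behavior, [("R", "P", "x", 0)], complete=True))        # True
print(is_feasible(behavior, [("R", "P", "x", 0), ("R", "P", "x", 0)]))   # False
```

The second stream is infeasible because it contains two responses for x while
the behavior issues only one read request for x.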

6.2 Causal Loops 

In this subsection we study the problem of causal loops. It is widely accepted
that there is a natural causality flow in information processing systems. More
technically speaking, a particular message value cannot be sent by an
interactive message passing system before all values on which it depends have
been received.

In the Alpha architecture there are two kinds of processing units, the
processors and the memory. The principle of causality can be applied to both of
them. A processor cannot issue a memory write request before it has received the
data to be written. The memory cannot respond to a read request before it has
received the memory write request that supplies that data.
To make our discussion more concrete, let us look at a simple example. We assume
that the processors P and Q execute the following two little programs. We use a
pidgin assembler language here, which has only the two commands LD (for load) and
ST (for store) and works with the local registers of the processors. The
registers are denoted by R1, R2, etc. Consider for the processor P the following
program

LD R1 x
ST R1 y

and for the processor Q the following program

LD R2 y
ST R2 x

Let us assume that initially the value 0 is stored both in location x and in
location y.

Of course, we may expect that, independent of the scheduling, the effect of ex- 
ecuting these programs is that both the locations x and y invariantly have the value 
0. Now let us consider the following access stream 

P:R?x Q:R?y P:W(y,l) Q:W(x,2) 

and the corresponding response stream: 

P:R(x,2) Q:R(y,l) 

This sequence of actions is certainly read/write consistent (only values are read that
have been written before). It also fulfills all the requirements of memory access
scheduling. So one may argue that it is a feasible access stream according to what
[AARM 92] requires of the memory. However, if we consider in addition
the processor functions $f_P$ and $f_Q$, we realize that this access order violates the
causality requirements of the processors. For the processor P, the write request can
be issued only after the response to the read request has been received.
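The causality argument can be made mechanical by recording, for each action, the actions it depends on: a read response depends on the write that supplied its value (memory causality), and a store depends on the load that filled its register (processor causality). A causal loop is then a cycle in this "depends on" relation. The following is a minimal Python sketch under those assumptions; the string encoding of actions and the function name `has_causal_loop` are hypothetical and not part of the functional model.

```python
from typing import Dict, List

def has_causal_loop(depends_on: Dict[str, List[str]]) -> bool:
    """Return True if the 'depends on' relation between actions contains a cycle.

    An edge a -> b means "a cannot take place before b has taken place".
    A cycle is exactly a causal loop: some action would have to precede itself.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current DFS path / finished
    colour: Dict[str, int] = {}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for dep in depends_on.get(node, []):
            state = colour.get(dep, WHITE)
            if state == GREY:             # back edge: we returned to an action on the path
                return True
            if state == WHITE and visit(dep):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in depends_on)


# Dependencies for the access and response streams discussed in the text.
causal_loop_example = {
    "P:R(x,2)": ["Q:W(x,2)"],   # memory: 2 must have been written to x first
    "Q:W(x,2)": ["Q:R(y,1)"],   # processor Q: ST R2 x needs the value of LD R2 y
    "Q:R(y,1)": ["P:W(y,1)"],   # memory: 1 must have been written to y first
    "P:W(y,1)": ["P:R(x,2)"],   # processor P: ST R1 y needs the value of LD R1 x
}

# Both loads return the initial value 0, which needs no write.
no_loop_example = {
    "P:R(x,0)": [],
    "P:W(y,0)": ["P:R(x,0)"],
    "Q:R(y,0)": [],
    "Q:W(x,0)": ["Q:R(y,0)"],
}

print(has_causal_loop(causal_loop_example))  # True
print(has_causal_loop(no_loop_example))      # False
```

The depth-first search with three colours is the standard way to detect a cycle in a directed graph; any cycle found corresponds to a chain of actions each of which would have to wait for itself.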

One may debate whether the paradoxical behavior of the causal loop demonstrated
above is actually admitted by [AARM 92] or not. Such an exegesis is not
very productive, however. As soon as one assumes a proper causality flow for the
processors, causal loops are ruled out anyway. We claim that the law of causality
flow applies to any realistic processor, whatever advanced concepts it includes.
Therefore, causal loops are excluded for any hardware.
6.3 Sequential Execution of Processors

We do not want to go into the details of the execution of particular
instructions in the Alpha AXP architecture. We therefore introduce, as a reference
for the behavior of the processors, the behavior of a processor with a conservative
sequential execution strategy. Such a behavior is obtained if the processor
sequentially executes the classical instruction cycle. This cycle first fetches the instruction
indicated by the program counter, then computes the operand address and either reads the
operand from the memory, writes a value into the memory, or issues a barrier
request, and then starts the cycle again.

This sequential execution strategy corresponds in our model to a particular stream
processing function for each processor $P \in PRC$, which we denote by

$$f_P^{seq} : RFR^\omega \to REQ^\omega$$

This function will be used in the following as a point of reference for formalizing
the correctness of the behaviors of processors with a nonsequential execution strategy.
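To make the role of $f_P^{seq}$ concrete, here is a hedged Python sketch of a conservative sequential processor for the LD/ST pidgin assembler used above, written as a function from the (possibly incomplete) stream of read responses received so far to the request stream issued so far. The encoding of instructions as tuples and the name `f_seq` are hypothetical, and instruction fetches are deliberately left out (the program is assumed to be held locally), so only read requests R?x and write requests W(x,a) appear.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the pidgin assembler: (opcode, register, location).
Instr = Tuple[str, str, str]

def f_seq(program: List[Instr], responses: List[int]) -> List[str]:
    """Request stream of a conservative sequential processor.

    `responses` is the stream of read responses received so far.
    Simplifying assumption: instructions are held locally, so no
    instruction fetch requests I?x are issued.
    """
    regs: Dict[str, int] = {}
    requests: List[str] = []
    received = iter(responses)
    for opcode, reg, loc in program:
        if opcode == "LD":
            requests.append(f"R?{loc}")
            value = next(received, None)
            if value is None:          # no response yet: the processor waits here
                break
            regs[reg] = value
        elif opcode == "ST":
            requests.append(f"W({loc},{regs[reg]})")
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return requests


program_P = [("LD", "R1", "x"), ("ST", "R1", "y")]
print(f_seq(program_P, []))    # ['R?x']
print(f_seq(program_P, [0]))   # ['R?x', 'W(y,0)']
```

Two properties of the sequential behavior are visible here: supplying more responses only extends the request stream (prefix monotonicity), and at any point at most one read request is outstanding.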

6.4 Speeding Up Executions 

In a most conservative implementation, every processor issues just one memory
request, then waits until it gets the response to this request, and only then issues the
next request. This is the behavior represented by the function $f_P^{seq}$.

In contrast to this conservative sequential behavior, a more aggressive processor 
may send several memory requests before it receives some of the responses from 
the memory. This can lead to a speed-up in the interaction between the memory and 
the processor. We call such behavior, where several requests can be issued before 
a response is received, issuing lookahead requests. 

We distinguish between the following strategies of processors for issuing
lookahead requests.

If a processor issues only memory requests whose responses are certainly needed 
for the execution, we speak of issuing conservative lookahead requests. In this 
case, any missing response from the memory will eventually bring the processor 
to a waiting state. 

A lookahead request processor may even issue memory read requests or
instruction fetch requests whose responses it might not need. We call this strategy issuing
speculative lookahead requests. It may lead to a considerable speed-up in accesses
to the memory. The price is that some requests may be processed that turn out to
be unnecessary. Speculative write requests are clearly not a safe concept and are
therefore not considered.

A further optimization can be obtained in cases where it can be recognized in
advance that the responses to certain read and instruction fetch requests that appear in
the strictly sequential behavior are irrelevant for the further course of the computation.
Such requests need not even be issued. We call this shortcut optimization. One
may argue that in a well-written program such irrelevant requests should not occur.
However, even when multiplying two values read from the memory, one
of the values turns out to be irrelevant if the other one is zero.
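The multiplication example can be sketched as two behaviors for the hypothetical statement z := x * y, each mapping the read responses received so far to the requests issued so far. This is an illustrative Python sketch only; the function names and the request strings are assumptions. The shortcut-optimized version avoids the read of y once x is known to be 0, so its request stream is obtained from the sequential one by dropping a read request.

```python
from typing import List

def product_seq(responses: List[int]) -> List[str]:
    """Sequential behavior for z := x * y: always reads both operands."""
    requests = ["R?x"]
    if len(responses) < 1:
        return requests                 # waiting for the value of x
    requests.append("R?y")
    if len(responses) < 2:
        return requests                 # waiting for the value of y
    x, y = responses[0], responses[1]
    requests.append(f"W(z,{x * y})")
    return requests

def product_shortcut(responses: List[int]) -> List[str]:
    """Shortcut-optimized behavior: skips reading y once x is known to be 0."""
    requests = ["R?x"]
    if len(responses) < 1:
        return requests
    x = responses[0]
    if x == 0:
        requests.append("W(z,0)")       # the value of y is irrelevant
        return requests
    requests.append("R?y")
    if len(responses) < 2:
        return requests
    requests.append(f"W(z,{x * responses[1]})")
    return requests


print(product_seq([0, 5]))       # ['R?x', 'R?y', 'W(z,0)']
print(product_shortcut([0]))     # ['R?x', 'W(z,0)']
```

When x is nonzero, both behaviors issue the same requests; only the zero case changes the causality flow, because the write no longer waits for a response to R?y.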

Shortcut optimization leads to a more radical change in the behavior of
processors than lookahead optimization. Responses to read and instruction fetch
requests are generally needed by a processor to get the information (instructions and
operands) required to continue its execution. Moreover, the arrival of responses is
used to trigger further requests. This is most obvious in the conservative sequential
behavior: the processor issues a request only when the response to the previously
issued request has been received. So there is a causal relation between responses
and the following requests. In lookahead optimizations, read and instruction fetch
requests are issued earlier and more of them may be issued, but at least all the
requests that appear in the nonoptimized behavior are eventually issued. Write
requests and barrier requests are issued before the corresponding responses are
received only as long as there are no actual data dependencies. In shortcut
optimizations, fewer requests may be issued by avoiding those whose responses would be
irrelevant. This may change the causality flow of a processor more radically.

It is one of the basic ideas of the Alpha AXP architecture that processors may 
issue lookahead memory requests, in order to speed up the general execution by 
parallelizing the memory accesses. In the following section we give a definition 
of the properties required for a processor to make sure that the replacement of a 
sequential processor by a processor with lookahead and shortcut optimizations does 
not change the effects of the executed programs as long as the program runs without 
any access to shared memory. 

6.5 Optimizations of Processor Behavior 

In this section we define the concept of valid optimization of the behavior of
processors. Not all read request responses in a response stream coming from the memory
are actually needed by the processor for further computation. Some read requests
may serve only lookahead purposes, and the responses to those might turn out to be
obsolete after they have been requested. Similarly, certain responses may be
irrelevant, since the transmitted values do not really influence the further computation.
For instance, if the value of a location is requested from the memory and the
transmitted value is later multiplied by 0, its value is certainly not relevant. Of course,
in practice it is very difficult to determine whether a response is relevant, since
irrelevant responses may even trigger further read requests whose responses
turn out to be irrelevant, too. Optimized processors may nevertheless make use of
the fact that a response is irrelevant. In this case the response might not even be
requested.

Request streams have two effects on the memory: write requests change the
state of the memory locations and can therefore be observed by other
processors; read and instruction fetch requests trigger responses by the memory and
thus allow the processor to observe the current state of locations. Memory and
instruction barrier requests restrict the memory access order. So for the effect of a
request stream on the memory it is not important how many barrier requests are
created, but only how they restrict the access order. This leads us to the following
definition.

Definition 2 (Feasible Schedulings of Memory Request Streams) For a request
stream $r_1 \in REQ^\omega$ of processor $P \in PRC$ we call a request stream
$r_2 \in (REQ \setminus \{P{:}MB, P{:}IMB\})^\omega$ a feasible scheduling for the stream $r_1$ if there exists a
stream $r_0 \in REQ^\omega$ such that for all requests $c_1, c_2 \in REQ$ we have:

$$c_1 < c_2 \Rightarrow [c_1 < c_2] \text{ in } [r_1, r_0]$$

and

$$r_2 = r_0|_M \quad \text{where } M = REQ \setminus \{P{:}MB, P{:}IMB\}$$

□

For the correctness of optimizations it is decisive to identify under which conditions
two request streams have the same effects on the memory, such that they lead to the
same set of possible observations by the processors. We may change the behavior
of a processor such that it issues a different request stream, as long as this is
observably equivalent to the previous request stream. Along these lines we define the
effective equivalence of request streams. The memory barrier and the instruction
fetch barrier requests restrict only the rescheduling of the memory requests. Two
request streams of a processor P are considered effectively equivalent if they
lead to the same effects and can therefore be called observation equivalent.

In the following definition, we do not require that effectively equivalent request 
streams have exactly the same substreams of barrier requests. We just require that 
they have the same read, instruction fetch and write requests and they include barrier 
requests that impose the same scheduling restrictions for them. 
Definition 3 (Effective Equivalence of Memory Request Streams) Two
request streams $r_1, r_2 \in REQ^\omega$ are called effectively equivalent for processor P
if their sets of feasible schedulings coincide. We then write $r_1 \sim r_2$.

□
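Definitions 2 and 3 can be explored by brute force for short finite request streams of a single processor. The sketch below is a Python illustration under explicit assumptions: the ordering relation $c_1 < c_2$ from the table of section 3 is not reproduced here, so `must_keep_order` is a hypothetical stand-in (barriers are ordered with every request; other requests only when they access the same location), and the witness stream $r_0$ is taken to be a reordering of $r_1$. All names are illustrative.

```python
from itertools import permutations
from typing import FrozenSet, List, Optional, Tuple

BARRIERS = {"MB", "IMB"}   # barrier requests of the single processor P considered here

def location(req: str) -> Optional[str]:
    """Location accessed by a request such as 'R?x', 'I?x' or 'W(x,1)'."""
    if req in BARRIERS:
        return None
    if req.startswith(("R?", "I?")):
        return req[2:]
    if req.startswith("W("):
        return req[2:].split(",")[0]
    raise ValueError(f"unknown request {req}")

def must_keep_order(a: str, b: str) -> bool:
    """Hypothetical stand-in for the ordering relation of section 3."""
    if a in BARRIERS or b in BARRIERS:
        return True
    return location(a) == location(b)

def feasible_schedulings(r1: List[str]) -> FrozenSet[Tuple[str, ...]]:
    """All barrier-free streams r2 obtainable as in Definition 2 (finite r1 only)."""
    n = len(r1)
    result = set()
    for order in permutations(range(n)):          # candidate r0 as a reordering of r1
        pos = {idx: k for k, idx in enumerate(order)}
        if all(pos[i] < pos[j]
               for i in range(n) for j in range(i + 1, n)
               if must_keep_order(r1[i], r1[j])):
            # r2 is r0 restricted to the non-barrier requests
            result.add(tuple(r1[idx] for idx in order if r1[idx] not in BARRIERS))
    return frozenset(result)

def effectively_equivalent(r1: List[str], r2: List[str]) -> bool:
    """Definition 3: the sets of feasible schedulings coincide."""
    return feasible_schedulings(r1) == feasible_schedulings(r2)


print(sorted(feasible_schedulings(["R?x", "R?y"])))   # [('R?x', 'R?y'), ('R?y', 'R?x')]
print(effectively_equivalent(["R?x", "MB", "R?y"], ["R?x", "MB", "MB", "R?y"]))  # True
```

The example reflects the point made before Definition 2: a duplicated barrier imposes no additional restriction, so the two streams are effectively equivalent, whereas dropping the barrier altogether admits a second scheduling.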

To have a basis to speak about the substreams of relevant and irrelevant responses, 
we use the concept of the decomposition of streams. 

Definition 4 (Decomposition of Streams) For an arbitrary stream $s \in M^\omega$, two
streams $s_1$ and $s_2$ are called a decomposition of s if the stream s can be split into
the substreams $s_1$ and $s_2$. Mathematically expressed: if there exists an oracle
$\beta \in \{1,2\}^\infty$ such that for $i \in \{1,2\}$

$$s_i = dis_i(s, \beta)$$

where the functions $dis_i : M^\omega \times \{1,2\}^\infty \to M^\omega$ are specified by

$$dis_i(\langle m\rangle \frown s, \langle i\rangle \frown \beta) = \langle m\rangle \frown dis_i(s, \beta)$$

and

$$i \neq j \Rightarrow dis_i(\langle m\rangle \frown s, \langle j\rangle \frown \beta) = dis_i(s, \beta)$$

□
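For finite streams, the oracle-based decomposition of Definition 4 amounts to labelling each element with 1 or 2 and collecting the elements by label, preserving order. The following Python sketch restricts itself to finite lists (the definition also covers infinite streams); the names `dis` and `decompose` are chosen for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def dis(i: int, s: Sequence[T], beta: Sequence[int]) -> List[T]:
    """dis_i(s, beta): the elements of s whose oracle entry is i, in order."""
    return [m for m, b in zip(s, beta) if b == i]

def decompose(s: Sequence[T], beta: Sequence[int]) -> Tuple[List[T], List[T]]:
    """Split s into (s1, s2) according to an oracle beta over {1, 2}."""
    if len(beta) < len(s) or any(b not in (1, 2) for b in beta):
        raise ValueError("beta must assign 1 or 2 to every element of s")
    return dis(1, s, beta), dis(2, s, beta)


# Separating relevant responses (label 1) from irrelevant ones (label 2):
print(decompose(["R(x,1)", "R(y,2)", "R(x,3)"], [1, 2, 1]))
# (['R(x,1)', 'R(x,3)'], ['R(y,2)'])
```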

Based on this definition we can now define when for a processor a behavior is a 
lookahead optimization and when it is a shortcut optimization of another behavior. 

Definition 5 (Lookahead Optimization) We consider the two behavior functions

$$f_1, f_2 : RFR^\omega \to REQ^\omega$$

for the processor P. The behavior $f_2$ is called a lookahead optimization of $f_1$ if,
for every response stream $s_2 \in RFR^\omega$ that is feasible for $f_2$, the following
condition holds: there is a decomposition of $s_2$ into a response stream $s_1 \in RFR^\omega$ and a
response stream $u_1$ such that $s_1$ is a feasible response stream for $f_1$ and the request
streams $f_1(s_1)$ and $f_2(s_2)$ are effectively equivalent.

□

Note that, by the definition above, the processor with behavior $f_2$ is a refinement of
the processor with behavior $f_1$, since every behavior that $f_2$ shows is an optimized
behavior of a behavior of $f_1$.

Shortcut optimizations are more difficult to define. In a shortcut optimization
certain requests are not issued. This can be done if the effects achieved that way are
equivalent to a behavior of the architecture for the nonoptimized issue stream.
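Definition 6 (Shortcut Refinement of Issue Streams) We define a relation

$$\rightsquigarrow \; \subseteq REQ^\omega \times REQ^\omega$$

that characterizes the allowed shortcut optimizations. We give axiomatic rules:

$$\begin{aligned}
&(\forall c_1, c_2 : c_1 < c_2 \Rightarrow [c_1 < c_2] \text{ in } [r_1, r_2]) \Rightarrow r_1 \rightsquigarrow r_2 \\
&r_1 \rightsquigarrow r_2 \wedge r_2 \rightsquigarrow r_3 \Rightarrow r_1 \rightsquigarrow r_3 \\
&r_1 \frown \langle P{:}MB\rangle \frown \langle P{:}MB\rangle \frown r_2 \rightsquigarrow r_1 \frown \langle P{:}MB\rangle \frown r_2 \\
&r_1 \frown \langle P{:}W(x,a)\rangle \frown \langle P{:}W(x,b)\rangle \frown r_2 \rightsquigarrow r_1 \frown \langle P{:}W(x,b)\rangle \frown r_2 \\
&r_1 \frown \langle P{:}R?x\rangle \frown r_2 \rightsquigarrow r_1 \frown r_2
\end{aligned}$$

□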
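The three rewriting axioms of Definition 6 (collapsing two adjacent barriers, dropping a write that is immediately overwritten at the same location, and dropping a read) can be explored by a breadth-first search over finite request streams; the transitivity axiom corresponds to chaining steps, and reflexivity follows from the first axiom. This Python sketch deliberately leaves out the reordering axiom, because the ordering relation of section 3 is not reproduced here; the names and the request encoding are hypothetical.

```python
from collections import deque
from typing import Set, Tuple

Stream = Tuple[str, ...]

def one_step_shortcuts(r: Stream) -> Set[Stream]:
    """Streams obtained from r by one application of the rewriting axioms
    of Definition 6 (the reordering axiom is deliberately left out)."""
    results: Set[Stream] = set()
    for k, req in enumerate(r):
        nxt = r[k + 1] if k + 1 < len(r) else None
        if req == "MB" and nxt == "MB":                        # MB MB ~> MB
            results.add(r[:k] + r[k + 1:])
        if req.startswith("W(") and nxt is not None and nxt.startswith("W("):
            if req[2:].split(",")[0] == nxt[2:].split(",")[0]:
                results.add(r[:k] + r[k + 1:])                 # overwritten write dropped
        if req.startswith("R?"):                               # read dropped
            results.add(r[:k] + r[k + 1:])
    return results

def is_shortcut_of(r1: Stream, r2: Stream) -> bool:
    """r1 ~> r2 under the axioms above, closed under reflexivity and transitivity."""
    seen = {r1}
    queue = deque([r1])
    while queue:
        current = queue.popleft()
        if current == r2:
            return True
        for nxt in one_step_shortcuts(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(is_shortcut_of(("W(x,1)", "W(x,2)", "R?y"), ("W(x,2)",)))  # True
print(is_shortcut_of(("W(x,1)", "W(y,2)"), ("W(y,2)",)))         # False
```

Since every rule shortens the stream, the search space is finite for finite streams and the search always terminates.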

This definition essentially expresses that in a shortcut optimization we may leave 
out superfluous write requests, read requests and memory barriers. Of course, a read 
request may not be left out if the requested value is needed by the processor. 

Definition 7 (Shortcut Optimization) Let us consider the two behavior functions

$$f_1, f_2 : RFR^\omega \to REQ^\omega$$

for the processor P. The behavior $f_2$ is called a shortcut optimization of $f_1$ if, for
every response stream $s_2 \in RFR^\omega$ that is feasible for $f_2$, the following condition
holds: there exists a response stream $s_1$ that is feasible for $f_1$ such that
$f_1(s_1) \rightsquigarrow f_2(s_2)$.

□

Definition 5 essentially says that in a lookahead optimization, for every response
stream feasible for $f_2$ we can find a response stream feasible for $f_1$ such that the response
streams coincide after we get rid of some irrelevant responses for $f_2$, and the resulting
request streams are effectively equivalent.

A shortcut optimization can lead to a processor behavior with a quite different
processor causality. This allows sophisticated optimizations where certain requests
and the corresponding responses are recognized as unnecessary and therefore
avoided, even though they are relevant in the control flow of $f_1$, since there they
trigger further relevant requests.

Conservative sequential execution is very restricted. In every state of the
processor, the number of issued read and instruction fetch requests is at most one larger
than the number of received responses. Mathematically, for a processor P with the
behavior function $f_P^{seq}$ the following property holds:

$$s \sqsubseteq cs_P \Rightarrow s = cs_P \vee 1 + \#s = \#(f_P^{seq}(s)|_{RI})$$

where RI is the set of read and instruction fetch requests. To obtain a less restricted
behavior, we allow certain requests to be issued earlier. We extend the notion of
processor correctness by shortcut and lookahead optimization. Based on the
concepts of sequential behavior and optimization, we can now define a notion of
correctness of processor functions.

Definition 8 (Correct Processor Function) A processor P with processor function

$$f_P : RFR^\omega \to REQ^\omega$$

is called correct if $f_P$ is a shortcut optimization of a lookahead optimization of $f_P^{seq}$.

□

Based on this definition, it is possible to prove that the observable behavior of the 
Alpha AXP architecture does not depend on the particular choice of the processor 
functions as long as all of the functions are correct. 

Theorem 1 (Scheduling Robustness) Assume that all processor functions are correct
and that

- the processors do not share any memory, and

- instruction fetch locations and write locations are disjoint.

Then the access stream obtained when all processors $P \in PRC$ execute the
sequential behavior represented by $f_P^{seq}$ is effectively equivalent to the access
stream obtained with the given processor functions.

Sketch of Proof: Since every processor function is correct, we can assume a
response stream $s_P$ and a request stream $r_P$ corresponding to a lookahead
optimization for the response stream $s_P^{seq}$ and the request stream $r_P^{seq}$ that we
obtain for the sequential behavior, such that the following holds. On the one hand,
we can decompose the response stream $s_P$ and the request stream $r_P$ into
substreams of the response stream $cs_P$ and the request stream $cr_P$ that we obtain
for the considered behavior. On the other hand, we can decompose the response stream
$s_P$ and the request stream $r_P$ into substreams of the response stream $s_P^{seq}$ and
the request stream $r_P^{seq}$ that we obtain for the sequential behavior. Hence the
request streams $r_P^{seq}$ and $cr_P$ are effectively equivalent.

□



The theorem shows that, as long as shared memory is not used and no writes occur
to locations that occur in instruction fetch requests, the sequences of memory states
produced for each of the processors coincide, even though different scheduling
strategies and different strategies for read and instruction fetch lookahead may be used.

6.6 Assumptions about Executions 

So far we have given only a very schematic description of the behavior of the
processors. In this section, we introduce a fundamental assumption about the behavior
of processors. First, we introduce the notion of the equivalence of
response streams.

Definition 9 (Equivalence of Response Streams) Two memory response streams
$s_1, s_2 \in RFR^\omega$ are called equivalent, and we write

$$s_1 \sim s_2$$

if read response messages and instruction fetch response messages for the same
locations are identical and in the same order; mathematically expressed, if for all
locations x the following proposition holds:

$$s_1|_{L(x)} = s_2|_{L(x)}$$

where

$$L(x) = \{P{:}R(x,a) : a \in DATA\} \cup \{P{:}I(x,a) : a \in DATA\}$$

□
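Definition 9 compares two response streams location by location: the projections onto $L(x)$ must be equal for every x, while responses for different locations may be interleaved differently. A minimal Python sketch for finite streams follows; the tuple encoding (processor, kind, location, value) with kind "R" or "I" and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical encoding of a response: (processor, kind, location, value), kind in {"R", "I"}.
Response = Tuple[str, str, str, int]

def by_location(s: List[Response]) -> Dict[str, List[Response]]:
    """The projections s|L(x) for every location x occurring in s."""
    groups: Dict[str, List[Response]] = defaultdict(list)
    for resp in s:
        _, kind, loc, _ = resp
        if kind in ("R", "I"):            # read and instruction fetch responses only
            groups[loc].append(resp)
    return dict(groups)

def responses_equivalent(s1: List[Response], s2: List[Response]) -> bool:
    """Definition 9: s1|L(x) = s2|L(x) for all locations x."""
    return by_location(s1) == by_location(s2)


s1 = [("P", "R", "x", 1), ("P", "R", "y", 2), ("P", "R", "x", 3)]
s2 = [("P", "R", "y", 2), ("P", "R", "x", 1), ("P", "R", "x", 3)]
print(responses_equivalent(s1, s2))   # True: only the interleaving of x and y differs
```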

Based on the notion of equivalence of response streams, we next define what it 
means that a processor is robust against reorderings of its response stream. 

Definition 10 (Response Delay Robustness) For a processor P, a processor
function

$$f_P : RFR^\omega \to REQ^\omega$$

is called response delay robust if, for all shortcut optimizations g of $f_P$ and all
response streams $s_1$ and $s_2$ that are feasible for $f_P$ and for which $s_1 \sim s_2$ holds, we
have:

$$g(s_1) \sim g(s_2)$$

□

This leads to the following basic assumption about the behavior of processors: 

• Assumption: Response Delay Robustness: for every processor P, its behavior
function $f_P$ is response delay robust.

Response delay robustness makes sure that the behavior of a processor does not 
critically depend on the order in which its memory requests are executed, as long 
as the scheduling constraints are fulfilled. 

Let us next briefly analyze how significant the requirement is that the streams
$cs_P$, $cr_P$, ct, cv are least fixpoints. To answer this question, we also ask whether
there exist solutions to the recursive equations for these streams that are not least
fixpoints.

To be able to answer this question, however, we need a further assumption. We
have not said anything about the behavior of a processor in the case where it gets a
response for which it did not send a request. We may assume, however, that a
processor that has terminated its execution and come to a halt does not issue any further
requests, even when it receives further (unrequested) responses.

Definition 11 (Response Satisfaction Property) A processor P with the behavior
function $f_P$ fulfills the response satisfaction property if, for all response streams
$s \in RFR^\omega$, the following proposition is fulfilled:

$$f_P \hookleftarrow_P s \wedge s \text{ complete for } f_P \wedge s \sqsubseteq \tilde{s} \Rightarrow f_P(s) = f_P(\tilde{s})$$

This proposition expresses that the processor P with behavior $f_P$ does not issue
further memory requests after all its requests have been satisfied, even if it gets further
memory responses that it has not requested.

□

By assuming the response satisfaction property for all processors, we can prove that 
all fixpoints of the recursive equations are unique. 

Theorem 2 (Uniqueness of Fixpoints) Let us assume the response satisfaction
property for all processors. If the streams $cs_P$, $cr_P$, ct, cv are fixpoints of the
defining equations and fulfill the constraints, then they are least fixpoints.

Proof: Assume the streams $cs_P$, $cr_P$, ct, cv fulfill the equations and constraints
listed in section 5, but are not necessarily least fixpoints of the equations. Assume
further that the streams $s_P$, $r_P$, t, v also fulfill the equations and constraints for the
same functions $f_P$, ms, md, and furthermore:

$$s_P \sqsubseteq cs_P \wedge r_P \sqsubseteq cr_P \wedge t \sqsubseteq ct \wedge v \sqsubseteq cv$$

The streams $cs_P$ are feasible responses for the functions $f_P$. Formally expressed,
we have $f_P \hookleftarrow_P cs_P$, and $cs_P$ is complete for $f_P$. According to the satisfaction
property of the processors, from $s_P \sqsubseteq cs_P$ and $f_P \hookleftarrow_P cs_P$ we may conclude
$r_P = cr_P$. By straightforward equational reasoning we obtain

$$t = ct \wedge v = cv \wedge s_P = cs_P$$

This shows that every set of streams that fulfills both the equations and the
constraints is a least fixpoint, as long as we assume the response satisfaction property
for all processors.

This theorem is of some importance for proofs about the execution of Alpha
programs using the functional model. It indicates that in proofs we do not have to rely
on least fixpoint properties, but may just work with fixpoint properties. Hence
we can work with purely equational reasoning.

The theorem does not say that there is only one fixpoint; it says that there is
only one fixpoint for each feasible scheduler function ms. Due to the
underspecification in the scheduler, there are many different scheduler functions ms that may
have different fixpoints.

7 Locking 

In order to synchronize programs properly, we need more sophisticated concepts 
than the ones treated so far. For the Alpha architecture load locked instructions and 
store conditional instructions are available. So far we have not said anything about 
locking. In this section, we briefly show how loads with locks and conditional stores 
can be treated in our model. 

7.1 Extension of the Model to Locking 

A locked load action is represented by P:K(x,a). Roughly speaking, a locked
load is a read action that in addition sets a lock flag. A subsequent conditional store
succeeds only if the lock flag is set. The lock flag is also cleared if a write request is
executed for the location for which the lock flag was set. A successful conditional
store request is a write request.
Action name                      Syntax      Memory request   Response
Load locked                      P:K(x,a)    P:K?x            P:R(x,a)
Store conditional successfully   P:S(x,a)    P:S(x,a)         P:S(x,L)
Store conditional failed         P:S(x,a)    P:S(x,a)         P:S(x,0)
Store conditional failed         P:S(x,0)    P:S(x,0)         P:S(x,0)

Figure 5: Table of actions for locking and their split into requests and responses



Conditional stores can be successful or they may fail. A successful conditional
store action is represented by P:S(x,a). A failed conditional store action is
represented by P:S(x,0). A memory request issued by the processor P for executing
a locked read on location x is represented by P:K?x. A memory request issued
by the processor P for executing a successful conditional store is represented by
P:S(x,a). A memory request issued by the processor P for executing a failed
conditional store is represented by P:S(x,0).

It looks strange that a processor can issue a request for a failing store action,
but this is used to model the following situation. Let us assume that the instruction
stream contains a conditional store instruction. When the processor executes this
instruction, the failure of the corresponding conditional store may already be determined
by the situation inside the processor. Then the processor issues a conditional store
request that is condemned to failure.

Memory requests issued by the processor P for executing locked read or
conditional store instructions both require responses. The response to a memory request
issued by the processor P for executing a locked read on location x is represented
by P:R(x,a). The response to a memory request issued by the processor P for
executing a conditional store is represented by P:S(x,L) if the store was successful,
and by P:S(x,0) if it failed.

The table given in Figure 5 shows these additional actions and their decomposition
into memory requests and the responses to them.

Let us assume from now on that the sets of requests REQ and responses RFR
also contain these additional requests and responses. A load locked request behaves
like a read request, but in addition may interfere with the execution of conditional
store requests. The conditional store request is like a write but, since it may fail, its
success or failure needs to be indicated to the processor. With respect to the memory
scheduling there is no difference between locked load and read or between
conditional store and write. There is a difference, however, with respect to read/write
consistency.
For convenience, when writing the specification, we use an additional artificial
request message with syntax P:K!x. It stands for locked loads in the access stream
that have already been executed. The execution of a locked load request P:K?x in
the access stream leads to a read response P:R(x,a) with the most recently written
value a. In the access stream, the marker P:K!x indicates the position at which the
lock has been executed. All other locks of processor P get cleared.

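In our mathematical model, we express read/write consistency for locked loads
by the following formula: for all streams $u, w \in REQ^*$, $s \in REQ^\omega$ we assume:

$$\begin{aligned}
&(u \frown w)|_{RISK} = \langle\rangle \wedge w|_{W(x)} = \langle\rangle \wedge w|_{NOL(P)} = w \wedge u|_{NOL(P)} = u \\
&\Rightarrow mm(u \frown \langle Q{:}W(x,a)\rangle \frown w \frown \langle P{:}K?x\rangle \frown s) = \langle P{:}R(x,a)\rangle \frown mm(u \frown \langle Q{:}W(x,a)\rangle \frown w \frown \langle P{:}K!x\rangle \frown s)
\end{aligned}$$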

where W(x) is the set of all the write requests for location x:

$$W(x) = \{P{:}W(x,a) \in REQ : P \in PRC \wedge a \in DATA\}$$

and NOL(P) is the set of all the requests other than the lock markers of processor
P:

$$NOL(P) = REQ \setminus \{P{:}K!x \in REQ : x \in LOC\}$$

and RISK is the set of all requests that require responses, namely the set of all the
read requests, instruction fetch requests, store conditional requests, and load locked
requests:

$$\begin{aligned}
RISK = {} & \{P{:}R?x \in REQ : P \in PRC \wedge x \in LOC\} \cup {} \\
& \{P{:}I?x \in REQ : P \in PRC \wedge x \in LOC\} \cup {} \\
& \{P{:}K?x \in REQ : P \in PRC \wedge x \in LOC\} \cup {} \\
& \{P{:}S(x,a) \in REQ : P \in PRC \wedge x \in LOC \wedge a \in DATA \cup \{0\}\}
\end{aligned}$$

This shows that a locked load request is processed like a read request, but after its
execution a marker is kept indicating the place in the access stream where the lock
occurred.

A conditional store on location y is successful if it is marked as successful by the
issuing processor P, if there is an executed lock P:K!x in the access stream, and if
there are no writes to the location x between this lock and the conditional store in
the access stream. The success of a conditional store is expressed by the following
formula (with $a \in DATA$):

$$\begin{aligned}
&(u \frown w)|_{RISK} = \langle\rangle \wedge w|_{W(x)} = \langle\rangle \wedge w|_{K(P)} = \langle\rangle \wedge u|_{NOL(P)} = u \\
&\Rightarrow mm(u \frown \langle P{:}K!x\rangle \frown w \frown \langle P{:}S(y,a)\rangle \frown s) = \langle P{:}S(y,L)\rangle \frown mm(u \frown w \frown \langle P{:}W(y,a)\rangle \frown s)
\end{aligned}$$
where K(P) is the set of all the lock markers of processor P:

$$K(P) = \{P{:}K!x \in REQ : x \in LOC\}$$

and W(x) is the set of all the write requests to location x as specified above.

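A conditional store fails if there is a write to the locked location x between
the last locked load and the execution of the store request:

$$\begin{aligned}
&(u \frown w)|_{RISK} = \langle\rangle \wedge w|_{W(x)} \neq \langle\rangle \wedge w|_{K(P)} = \langle\rangle \wedge u|_{NOL(P)} = u \\
&\Rightarrow mm(u \frown \langle P{:}K!x\rangle \frown w \frown \langle P{:}S(y,a)\rangle \frown s) = \langle P{:}S(y,0)\rangle \frown mm(u \frown w \frown s)
\end{aligned}$$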

A conditional store also fails if no lock marker issued by the processor is still
valid:

$$u|_{RISK} = \langle\rangle \wedge u|_{K(P)} = \langle\rangle \Rightarrow mm(u \frown \langle P{:}S(x,a)\rangle \frown s) = \langle P{:}S(x,0)\rangle \frown mm(u \frown s)$$

The failure of a conditional store request marked as failing is expressed by the
following formula:

$$u|_{RISK} = \langle\rangle \wedge u|_{NOL(P)} = u \Rightarrow mm(u \frown \langle P{:}S(x,0)\rangle \frown s) = \langle P{:}S(x,0)\rangle \frown mm(u \frown s)$$

Whether a conditional store request issued by a processor is marked as condemned
to failure, or as successful if not interfered with by another processor, is left
unspecified. In [AARM 92], page (I) 4-9, this is called unpredictable. In the functional
model of a processor, it is part of the specification of the behavior of the processor
to describe under which conditions a conditional store request is marked as condemned
to failure or as successful. In the mathematical model these conditions can also be left
unspecified, and thus unpredictable. However, more sophisticated fairness conditions
can also be formulated.
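The locking rules above can be read as an interpreter for finite access streams in which one lock marker is kept per processor. The Python sketch below is an approximation under stated assumptions, not the specification itself: the tuple encoding of requests is hypothetical, all locations initially hold 0, and a condemned store (value FAIL) also consumes a still-valid lock marker, a case the formulas above do not address. As in the success formula, the locked location and the stored location need not coincide. The response flags reuse the symbols printed in Figure 5.

```python
from typing import Dict, List, Optional, Tuple

OK, FAIL = "L", "0"   # success/failure flags of conditional-store responses (symbols as in Figure 5)

def mm_locking(access: List[Tuple], initial: int = 0) -> List[Tuple]:
    """Responses to a finite access stream under the locking rules of section 7.1.

    Hypothetical request encoding:
      ("W", P, x, a)  write           ("R", P, x)     read
      ("K", P, x)     load locked     ("S", P, x, a)  conditional store;
                                      a == FAIL marks a store condemned by P
    """
    memory: Dict[str, int] = {}
    lock: Dict[str, Optional[str]] = {}   # processor -> location of its lock marker K!x

    def write(x: str, a: int) -> None:
        memory[x] = a
        for p, locked in lock.items():    # a write to x invalidates every lock on x
            if locked == x:
                lock[p] = None

    responses: List[Tuple] = []
    for req in access:
        kind, p, x = req[0], req[1], req[2]
        if kind == "W":
            write(x, req[3])
        elif kind == "R":
            responses.append(("R", p, x, memory.get(x, initial)))
        elif kind == "K":                 # answered like a read, then the marker is set
            responses.append(("R", p, x, memory.get(x, initial)))
            lock[p] = x                   # only one lock per processor is kept
        elif kind == "S":
            a = req[3]
            succeeds = a != FAIL and lock.get(p) is not None
            lock[p] = None                # the marker is consumed either way (simplification)
            if succeeds:
                write(x, a)               # a successful conditional store acts as a write
                responses.append(("S", p, x, OK))
            else:
                responses.append(("S", p, x, FAIL))
        else:
            raise ValueError(f"unknown request {req}")
    return responses


# Two processors race: only the first conditional store in the access stream succeeds.
race = [("K", "P", "x"), ("K", "Q", "x"), ("S", "P", "x", 1), ("S", "Q", "x", 1)]
print(mm_locking(race))
# [('R', 'P', 'x', 0), ('R', 'Q', 'x', 0), ('S', 'P', 'x', 'L'), ('S', 'Q', 'x', '0')]
```

Using the string "0" as the failure marker keeps it distinct from the data value 0, which a store may legitimately write.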

7.2 Analysis of a Simple Locking Discipline 

In this section we show how a simple mutual exclusion scheme works with the
described memory scheduling discipline.

We consider the following mutual exclusion scheme as given in [AARM 92],
page (I) 5-6. Every processor executes the following program before it enters
its critical phase:



try_again:  LDL   R1  x
            <modify y>
            STC   R1  x
            BEQ   R1  no_store
no_store:   BR    try_again



Our specification allows us to conclude that the store conditional request can be
successful only if it is issued as successful by the processor and if there is no write
to location x between the load locked and the store conditional in the access stream.
Using these facts, the specification of the memory function allows us to prove that
this simple protocol works.
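The argument can be illustrated by simulation. The following Python sketch runs several processors through the try_again loop under a random interleaving of their memory accesses, with one lock flag per processor in a sequential memory. It rests on hypothetical choices: the placeholder `<modify>` step is taken to be an increment of the loaded value, and the class and function names are illustrative. A conditional store fails only if some other processor's store succeeded after the corresponding locked load, so every increment takes effect exactly once and the loop always terminates.

```python
import random
from typing import Dict, Iterator, List, Optional

class LockingMemory:
    """Sequential memory with one lock flag per processor (a simplified sketch)."""

    def __init__(self) -> None:
        self.cells: Dict[str, int] = {}
        self.lock: Dict[str, Optional[str]] = {}

    def load_locked(self, p: str, x: str) -> int:
        self.lock[p] = x                          # LDL: read and set the lock flag
        return self.cells.get(x, 0)

    def store_conditional(self, p: str, x: str, a: int) -> int:
        """STC: returns 1 on success and 0 on failure."""
        if self.lock.get(p) is None:
            return 0
        self.lock[p] = None
        self.cells[x] = a
        for q, locked in self.lock.items():       # the write clears all other locks on x
            if locked == x:
                self.lock[q] = None
        return 1

def try_again_loop(mem: LockingMemory, p: str, x: str, times: int) -> Iterator[None]:
    """The loop of section 7.2 with <modify> hypothetically chosen as R1 := R1 + 1.
    Each `yield` lets the scheduler interleave another processor."""
    for _ in range(times):
        while True:
            r1 = mem.load_locked(p, x)                # try_again: LDL R1 x
            yield
            r1 = mem.store_conditional(p, x, r1 + 1)  # <modify>; STC R1 x
            yield
            if r1 != 0:                               # BEQ R1 no_store (no_store: BR try_again)
                break

def run(num_procs: int = 3, times: int = 50, seed: int = 1) -> int:
    rng = random.Random(seed)
    mem = LockingMemory()
    procs: List[Iterator[None]] = [
        try_again_loop(mem, f"P{i}", "x", times) for i in range(num_procs)
    ]
    while procs:
        proc = rng.choice(procs)
        try:
            next(proc)
        except StopIteration:
            procs.remove(proc)
    return mem.cells.get("x", 0)


print(run())  # 150: every increment takes effect exactly once
```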

8 Conclusion 

The purpose of this report is to give a mathematical model of the Alpha shared
memory system. It provides a consistent description of the Alpha shared memory system and of
the locking rules as given in [AARM 92]. It is, of course, still a simplification, since
it does not consider interrupts or exceptions.

It is nevertheless helpful for analyzing some of the properties of the Alpha
architecture and of the programs that run on it. Moreover, it clarifies some issues not
treated explicitly in [AARM 92], such as the treatment of causal loops.

This report also shows the usefulness and flexibility of functional system
models. It demonstrates that functional system specifications can be used to describe
system architectures of considerable complexity and their properties.
A Appendix: A More Liberal Scheduling Concept

In the model given so far we have a coherent memory in the sense that all processors
make observations that are consistent with the assumption of a sequential global
memory. Therefore, if two locations x and y that initially hold the value 1 are updated
to 2, it is impossible that one process receives the response stream

R(y,2) R(x,1) R(x,2)

as reaction to its request stream 

R?y MB R?x MB R?x 

and the other process obtains 

R(x,2) R(y,1) R(y,2)

as reaction to its request stream 

R?x MB R?y MB R?y 

This is excluded by the assumption of consistent updates of a global memory, which
leads to serializability of the access stream.
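Why a sequential global memory excludes these two observations can be checked mechanically. Under the assumptions that each location is updated exactly once (from 1 to 2) and that the memory barriers keep each process's reads in issue order, a process that first sees the new value of a and later the old value of b proves that a was updated before b. The Python sketch below derives these implied orders and reports a contradiction; the helper names are hypothetical, and the pairwise test is only sufficient for examples with two locations such as this one.

```python
from itertools import combinations
from typing import List, Set, Tuple

Observation = Tuple[str, int]   # (location, value read); successive reads separated by MB

def implied_write_order(observations: List[Observation], new: int = 2) -> Set[Tuple[str, str]]:
    """Pairs (a, b) meaning 'the update of a precedes the update of b' in any
    sequential global memory, assuming each location is updated exactly once
    to the value `new` and the reads are kept in issue order by barriers."""
    order: Set[Tuple[str, str]] = set()
    for (loc_i, val_i), (loc_j, val_j) in combinations(observations, 2):
        if loc_i != loc_j and val_i == new and val_j != new:
            order.add((loc_i, loc_j))     # a was already updated while b was not yet
    return order

def serializable(*per_process: List[Observation]) -> bool:
    """False if two processes imply opposite update orders (enough for two locations)."""
    order = set().union(*(implied_write_order(obs) for obs in per_process))
    return not any((b, a) in order for (a, b) in order)


first = [("y", 2), ("x", 1), ("x", 2)]    # R(y,2) R(x,1) R(x,2)
second = [("x", 2), ("y", 1), ("y", 2)]   # R(x,2) R(y,1) R(y,2)
print(serializable(first, second))        # False
```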

We model a more liberal memory scheduling for the Alpha architecture by the
data flow model given in Figure 6. In this data flow model the memory is distributed:
every processor has its own copy of the memory. The processors are connected
to their memories by channels. The channels are denoted by $cs_P$, $cr_P$, and $ct_P$,
where P denotes the corresponding processor.

Each processor sends a stream of memory requests to the scheduler and receives
a stream of read response messages from its memory. In a computation of the Alpha
architecture, the behavior of each processor $P \in PRC$ and the function mm
are modeled as before. In a computation, a stream is associated with each of the
channels $cs_P$, $cr_P$, and $ct_P$. These streams will be denoted by $cs_P$, $cr_P$, and $ct_P$,
respectively.

We formalize the requirements for the behavior of the scheduler component by
a function

$$ms : (PRC \to REQ^\omega) \to (PRC \to REQ^\omega)$$

which models the proper memory access scheduling and determines the access
streams and thus the access orderings, and by the memory service

$$mm : REQ^\omega \to RFR^\omega$$

which models read/write consistency. The function md is then simply obtained by
composing ms and mm.
Figure 6: Data flow graph of the Alpha architecture

We give the specification for the function ms by the relation between the streams
$cr_P$ and $ct_P$. The function ms merges and reschedules its input streams.
Two streams $s_1$ and $s_2$ are called consistent, and we write

$$s_1 \bowtie s_2$$

if

$$s_1 \sqsubseteq s_2 \vee s_2 \sqsubseteq s_1$$

For the function ms we require the following safety conditions:

• for all locations x, the access streams are consistent for all processors P and
Q:

$$ms(r)_P|_{W(x)} \bowtie ms(r)_Q|_{W(x)}$$

where

$$W(x) = \{P{:}W(x,a) \in REQ : P \in PRC \wedge a \in DATA\}$$



• for each processor P, the restriction of the access order with respect to the
issue order is obeyed in its access stream; for all $r \in (PRC \to REQ^\omega)$ we
have

$$\exists t \in REQ^\omega : ms(r)_P \sqsubseteq t \wedge \forall c_1, c_2 \in R(P) : c_1 < c_2 \Rightarrow [c_1 < c_2] \text{ in } [r_P, t]$$

where

$$R(P) = \{c \in REQ : processor(c) = P\}$$
• for each processor $P$, the only read requests that occur in its access stream $ms(r)_P$ are those issued by $P$.
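The first and third safety conditions can be checked mechanically on finite access streams. The following sketch assumes a hypothetical tuple encoding of requests and takes the access streams $ms(r)_P$ as a dictionary from processor names to lists. The second condition is not checked here, since it depends on the ordering table of Section 3 and on an existentially quantified extension $t$.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical request encoding (not from the paper):
#   ("W", proc, loc, value) stands for P:W(loc, value); ("R", proc, loc) is a read.
Request = Tuple
Access = Dict[str, List[Request]]


def writes_to(stream: List[Request], loc: str) -> List[Request]:
    """The restriction of a stream to W(loc), the writes to location loc."""
    return [c for c in stream if c[0] == "W" and c[2] == loc]


def consistent(s1: List[Request], s2: List[Request]) -> bool:
    """One stream is a prefix of the other iff they agree on the shorter length."""
    n = min(len(s1), len(s2))
    return s1[:n] == s2[:n]


def check_safety(access: Access) -> bool:
    """Check safety conditions 1 and 3 on finite access streams ms(r)_P."""
    locations = {c[2] for s in access.values() for c in s}
    # Condition 1: per location, the write streams of all copies are consistent.
    for loc in locations:
        for p, q in combinations(access, 2):
            if not consistent(writes_to(access[p], loc), writes_to(access[q], loc)):
                return False
    # Condition 3: the access stream of P contains only P's own reads.
    return all(c[0] != "R" or c[1] == p for p, s in access.items() for c in s)
```

For example, two copies that see the writes to the same location in different orders fail the first check, while a copy that has simply not yet received a later write still passes it.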



For the scheduler we require the following liveness condition: for any pair of memory requests $c_1$, $c_2$ of processor $P$ for which $c_1 < c_2$ holds in the table given in Section 3, their relative order in the issue stream $cr_P$ and in the access stream $ms(cr)_P$ coincide. Expressed mathematically, the liveness condition is

$c_1 < c_2 \Rightarrow [c_1 < c_2] \text{ in } [cr_P, ms(cr)_P]$
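On a finite run, the liveness condition can be approximated by the check below. This is a sketch under stated assumptions: requests are made distinct by a sequence number, $[c_1 < c_2] \text{ in } [s, t]$ is read as "both requests occur in $t$ and their relative order in $t$ equals that in $s$", and `same_location_conflict` is a hypothetical stand-in for the table of Section 3, not the table itself.

```python
from typing import Callable, List, Tuple

# Hypothetical encoding: (kind, location, sequence number); the sequence
# number makes every issued request distinct.
Req = Tuple[str, str, int]


def order_kept(c1: Req, c2: Req, s: List[Req], t: List[Req]) -> bool:
    """[c1 < c2] in [s, t] on finite streams: both requests occur in t and
    their relative order in t is the same as in s."""
    if c1 not in t or c2 not in t:
        return False
    return (s.index(c1) < s.index(c2)) == (t.index(c1) < t.index(c2))


def liveness_holds(issue: List[Req], access: List[Req],
                   precedes: Callable[[Req, Req], bool]) -> bool:
    """c1 < c2 => [c1 < c2] in [issue, access] for all pairs of issued requests."""
    return all(order_kept(c1, c2, issue, access)
               for c1 in issue for c2 in issue
               if c1 != c2 and precedes(c1, c2))


def same_location_conflict(c1: Req, c2: Req) -> bool:
    """Hypothetical ordering rule: same location and at least one write."""
    return c1[1] == c2[1] and "W" in (c1[0], c2[0])


issue = [("W", "x", 0), ("R", "y", 1), ("R", "x", 2)]
print(liveness_holds(issue, [("R", "y", 1), ("W", "x", 0), ("R", "x", 2)],
                     same_location_conflict))
```

Since `order_kept` only looks up positions, the access stream may also contain requests of other processors, as $ms(cr)_P$ does in the model.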

In addition to the safety properties required for $ms$, this liveness condition restricts the choice of the scheduling function $ms$: it makes sure that all requests are eventually scheduled.

B Appendix: Mathematical Basis 

A stream represents a communication history for a channel. A stream of messages 
over a given message set M is a finite or infinite sequence of messages. We define 

$M^\omega =_{df} M^* \cup M^\infty$

We briefly repeat the concepts from the theory of streams that are used in the specifications. More comprehensive explanations can be found in [Broy 90].

• By $x \frown y$ we denote the result of concatenating two streams $x$ and $y$. We assume that $x \frown y = x$ if $x$ is infinite.

• By $\langle\rangle$ we denote the empty stream.

• By $ft(x)$ we denote the first element of a stream $x$; if $x$ is empty, $ft(x)$ is undefined.

• By $rt(x)$ we denote the stream obtained from $x$ by dropping its first element; if $x$ is empty, the resulting stream is empty.

• If a stream $x$ is a prefix of a stream $y$, we write $x \sqsubseteq y$. The relation $\sqsubseteq$ is called the prefix order. It is formally specified by

$x \sqsubseteq y =_{df} \exists z \in M^\omega : x \frown z = y$
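These operations have a direct reading in Python if finite streams are represented as tuples and possibly infinite streams as iterators. The names `conc`, `ft`, `rt`, and `prefix` below are hypothetical; note how `itertools.chain` never reaches its second argument when the first is infinite, which matches the convention $x \frown y = x$ for infinite $x$.

```python
from collections.abc import Iterable, Iterator, Sequence
from itertools import chain, count, islice


def conc(x: Iterable, y: Iterable) -> Iterator:
    """x ⌢ y; if x is infinite, y is never reached, so x ⌢ y behaves as x."""
    return chain(x, y)


def ft(x: Sequence):
    """First element of x; undefined on the empty stream (here: an exception)."""
    if len(x) == 0:
        raise ValueError("ft is undefined on the empty stream")
    return x[0]


def rt(x: Sequence) -> tuple:
    """x without its first element; rt(<>) = <>."""
    return tuple(x[1:])


def prefix(x: Sequence, y: Sequence) -> bool:
    """x ⊑ y for finite streams, i.e. x ⌢ z = y for some z."""
    return tuple(y[: len(x)]) == tuple(x)


print(list(islice(conc(count(), (-1,)), 5)))
```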
The behavior of deterministic interactive systems with $n$ input channels and $m$ output channels is modeled by $(n,m)$-ary stream processing functions

$f : (M^\omega)^n \to (M^\omega)^m$
We use the following notions from domain and fixpoint theory:

• A stream processing function $f$ is called prefix monotonic if for all tuples of streams $x, y \in (M^\omega)^n$ we have

$x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y)$

• By $\sqcup S$ we denote the least upper bound of a set $S$, if it exists.

• A set $S$ is called directed if for any pair of elements $x$ and $y$ in $S$ there exists an upper bound in $S$.

• A stream processing function $f$ is called prefix continuous if $f$ is prefix monotonic and for every directed set $S \subseteq (M^\omega)^n$ we have:

$f(\sqcup S) = \sqcup\{f(x) : x \in S\}$

• A partially ordered set is called complete if every directed set has a least upper 
bound. 

The set of streams and the set of tuples of streams are complete with respect to the prefix order; in particular, every directed set of streams has a least upper bound.
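Prefix monotonicity can be refuted, though not proved, by testing: the sketch below checks the implication on all pairs of prefixes of one finite input stream. The example functions `running_sum` and `reverse` are hypothetical; the first is monotonic, while reversing a stream is not, since extending the input changes output that was already produced.

```python
from itertools import accumulate
from typing import Callable, Sequence, Tuple


def is_prefix(x: Sequence, y: Sequence) -> bool:
    """x ⊑ y for finite streams."""
    return tuple(y[: len(x)]) == tuple(x)


def monotonic_on(f: Callable[[Tuple], Tuple], x: Sequence) -> bool:
    """Test x1 ⊑ x2 => f(x1) ⊑ f(x2) for all pairs of prefixes x1 ⊑ x2 of x."""
    prefixes = [tuple(x[:k]) for k in range(len(x) + 1)]
    return all(is_prefix(f(a), f(b))
               for i, a in enumerate(prefixes) for b in prefixes[i:])


def running_sum(x: Tuple) -> Tuple:
    return tuple(accumulate(x))


def reverse(x: Tuple) -> Tuple:
    return tuple(reversed(x))


print(monotonic_on(running_sum, (1, 2, 3)), monotonic_on(reverse, (1, 2, 3)))
```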

In specifications we use the filter function in infix notation. Let $S$ be an arbitrary subset of the set $M$ and let $x \in M^\omega$ be a stream over $M$. We specify:

$(m \frown x)|_S = x|_S \quad \Leftarrow \neg(m \in S)$
$(m \frown x)|_S = m \frown (x|_S) \quad \Leftarrow m \in S$
$\langle\rangle|_S = \langle\rangle$
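The filter function can be written either as a direct transcription of the three defining equations (for finite streams) or lazily, so that it also applies to infinite streams. Both versions below are illustrative sketches with hypothetical names. On an infinite input that contains no further elements of $S$, the lazy version simply never yields another element, which corresponds to no further output from that point on.

```python
from collections.abc import Container, Iterable, Iterator
from itertools import count, islice


def filt_rec(x: tuple, s: Container) -> tuple:
    """x|S for a finite stream x, following the three defining equations."""
    if not x:                       # <>|S = <>
        return ()
    m, rest = x[0], x[1:]
    if m in s:                      # (m ⌢ x)|S = m ⌢ (x|S)
        return (m,) + filt_rec(rest, s)
    return filt_rec(rest, s)        # (m ⌢ x)|S = x|S if m is not in S


def filt(x: Iterable, s: Container) -> Iterator:
    """x|S computed lazily, so that x may also be infinite."""
    return (m for m in x if m in s)


print(filt_rec(tuple("abcab"), {"a", "b"}))
print(list(islice(filt(count(), range(0, 10**9, 2)), 3)))
```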

Furthermore, we use the function

$\# : M^\omega \to \mathbb{N} \cup \{\infty\}$

that yields the length of a stream. It is specified by (for $m \in M$):

$\#\langle\rangle = 0$
$\#(m \frown x) = 1 + \#x$
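For finite streams, the two equations for $\#$ amount to counting elements; the loop below is equivalent to that recursion (for an infinite stream it would not terminate). As a usage example, it counts the writes to a location in an access stream written in a hypothetical tuple encoding, i.e. the length of a filtered stream.

```python
from collections.abc import Iterable


def length(x: Iterable) -> int:
    """#x for a finite stream x: #<> = 0 and #(m ⌢ x) = 1 + #x."""
    n = 0
    for _ in x:
        n += 1
    return n


# Hypothetical encoding: ("W", proc, loc, value) and ("R", proc, loc).
access = [("W", "P", "x", 1), ("R", "P", "x"), ("W", "Q", "x", 2), ("W", "Q", "y", 3)]
writes_to_x = [c for c in access if c[0] == "W" and c[2] == "x"]
print(length(writes_to_x))
```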



We model the behavior of interactive components by sets of continuous (and therefore, by definition, also monotonic) stream processing functions. Monotonicity models causality between input and output: it reflects the fact that in an interactive system previous output cannot be changed when further input arrives. Continuity models the fact that, for every behavior, the system's reaction to infinite input can be predicted from its reactions to all finite prefixes of this input¹. The empty stream represents the information "further communication unspecified".

A specification describes a set of stream processing functions that represent the behaviors of the specified system. If this set is empty, the specification is called inconsistent. If the set contains exactly one element, the specification is called determined. If the set has more than one element, the specification is called underdetermined, and we also speak of underspecification. An underdetermined specification can also be used to describe hardware or software units that are nondeterministic. An executable system description is called nondeterministic if it is underdetermined. The underspecification in the description of the behaviors of a nondeterministic system then allows nondeterministic choices to be carried out during the execution of the system. In the functional modeling of interactive systems there is no difference in principle between underspecification and the operational notion of nondeterminism. In particular, in such a framework it makes no difference whether the nondeterministic choices are taken before the execution starts or step by step during the execution.
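The distinction between inconsistent, determined, and underdetermined specifications can be illustrated on a single finite input. In the sketch below, a specification is a hypothetical predicate relating an output stream to two input streams, and candidate outputs are restricted to rearrangements of the input messages. A merge specification, reminiscent of the merging performed by $ms$, admits several outputs and is therefore underdetermined, whereas requiring plain concatenation admits exactly one. Strictly speaking, the classification concerns sets of functions; looking at one input only approximates it.

```python
from itertools import permutations
from typing import Callable, Set, Tuple

Stream = Tuple[str, ...]
Spec = Callable[[Stream, Stream, Stream], bool]


def is_interleaving(out: Stream, x: Stream, y: Stream) -> bool:
    """Merge specification: out interleaves x and y, keeping the order of each."""
    if not out:
        return not x and not y
    return ((bool(x) and out[0] == x[0] and is_interleaving(out[1:], x[1:], y))
            or (bool(y) and out[0] == y[0] and is_interleaving(out[1:], x, y[1:])))


def admissible_outputs(spec: Spec, x: Stream, y: Stream) -> Set[Stream]:
    """Outputs allowed by spec for input (x, y); candidates are rearrangements of x ⌢ y."""
    return {out for out in set(permutations(x + y)) if spec(out, x, y)}


def classify(outputs: Set[Stream]) -> str:
    if not outputs:
        return "inconsistent"
    return "determined" if len(outputs) == 1 else "underdetermined"


x, y = ("a", "b"), ("c",)
print(sorted(admissible_outputs(is_interleaving, x, y)))
print(classify(admissible_outputs(is_interleaving, x, y)))
print(classify(admissible_outputs(lambda out, x, y: out == x + y, x, y)))
```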



¹ This does not exclude the specification of more elaborate liveness properties, including fairness. Note that fairness is, in general, a property concerned with "fair" choices among an infinite number of behaviors.